{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center>RegEx in Python</center>\n",
    "\n",
    "![](images/memes/meme25.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Backreferencing\n",
    "\n",
    "**Backreferences** in a pattern allow you to specify that the contents of an earlier capturing group must also be found at the current location in the string. \n",
    "\n",
    "> For example, `\\1` will succeed if the exact contents of group `1` can be found at the current position, and fails otherwise.\n",
    "\n",
    "### Example 1\n",
    "\n",
    "Consider a scenario where we want to find all the duplicated words in the given text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "hello hello\n",
    "how are you\n",
    "bye bye\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"(\\w+) \\\\1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['hello', 'bye']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Since Python’s string literals also use a **backslash followed by numbers** to allow including arbitrary characters in a string, backreferences need to be **escaped** so that regex engine gets proper format. We can also use **raw strings** to ignore escaping.\n",
    "\n",
    "Here is an example using raw strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(r\"(\\w+) \\1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['hello', 'bye']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pattern.findall(txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 2\n",
    "\n",
    "Consider a scenario where we want to find all dates with the format `dd/mm/yyy` and change them to `yyyy-mm-dd` format. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "txt = \"\"\"\n",
    "today is 23/02/2019.\n",
    "yesterday was 22/02/2019.\n",
    "tomorrow is 24/02/2019.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern = re.compile(\"(\\d{2})\\/(\\d{2})\\/(\\d{4})\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "newtxt = pattern.sub(r\"\\3-\\2-\\1\", txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "today is 2019-02-23.\n",
      "yesterday was 2019-02-22.\n",
      "tomorrow is 2019-02-24.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(newtxt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Backreferences, too, cannot be used inside a character class. The `\\1` in a regex like `(a)[\\1b]` is either an error or a needlessly escaped literal 1. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](images/memes/meme26.jpg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
